refactor rl config: 1) remove unnecessary params; 2) model_config consistency #926

Hanjun-Dai · 2025-12-20T23:45:15Z

There are three configs, actor_model_config, reference_model_config and rollout_model_config, whose values may conflict with each other if the user doesn't overwrite all of them explicitly. The benefits of this change are:

Consistency

Current

Here the user overwrites the model to qwen-0.6b, but the rest of the configs still show llama-8b, which is very confusing.
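A minimal sketch of how this kind of mismatch can arise, assuming the three configs are plain dictionaries that each carry a hypothetical `model_name` key. The key name, the default values and both helper functions are illustrative assumptions, not the actual tunix config schema:

```python
from copy import deepcopy

# Hypothetical defaults; the "model_name" key and these values are assumptions.
DEFAULTS = {
    "actor_model_config": {"model_name": "llama-8b"},
    "reference_model_config": {"model_name": "llama-8b"},
    "rollout_model_config": {"model_name": "llama-8b"},
}


def apply_overrides(defaults, overrides):
    """Return a copy of `defaults` with per-section overrides merged in."""
    merged = deepcopy(defaults)
    for section, values in overrides.items():
        merged.setdefault(section, {}).update(values)
    return merged


def distinct_model_names(config):
    """Collect the model names used across all *_model_config sections."""
    return {
        section_cfg["model_name"]
        for name, section_cfg in config.items()
        if name.endswith("model_config") and "model_name" in section_cfg
    }


# The user overrides only one section; the other two keep their defaults.
config = apply_overrides(
    DEFAULTS, {"actor_model_config": {"model_name": "qwen-0.6b"}}
)
names = distinct_model_names(config)
if len(names) > 1:
    print(f"inconsistent model configs: {sorted(names)}")
```

With only one section overridden, the resolved config reports two different model names, which is the confusing state described above.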

After this patch

Reusing checks in cli/config.py

Many checks and validations are based on model_config. Right now, it seems that for RL the config is set through reference_model_config. To make this consistent, we can overwrite model_config when no explicit configuration is set for it.
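The fallback described here could look roughly like the sketch below. It assumes dictionary-based configs, a hypothetical `explicit_keys` set recording which sections the user overrode, and a stand-in `validate_model_config` check; none of these names come from cli/config.py:

```python
from copy import deepcopy


def resolve_model_config(config, explicit_keys):
    """Fill model_config from reference_model_config unless it was set explicitly.

    `explicit_keys` is hypothetical bookkeeping: the set of top-level
    sections the user overrode (for example through the CLI).
    """
    resolved = deepcopy(config)
    if "model_config" not in explicit_keys:
        resolved["model_config"] = deepcopy(resolved["reference_model_config"])
    return resolved


def validate_model_config(config):
    """Stand-in for a shared check that reads only model_config."""
    if not config["model_config"].get("model_name"):
        raise ValueError("model_config.model_name must be set")


config = {
    "model_config": {"model_name": "llama-8b"},  # untouched default
    "reference_model_config": {"model_name": "qwen-0.6b"},  # user override
}
resolved = resolve_model_config(config, explicit_keys={"reference_model_config"})
validate_model_config(resolved)  # the shared check now sees the RL model
print(resolved["model_config"]["model_name"])  # qwen-0.6b
```

Because an explicit override of model_config is left alone, scripts that already set it directly would behave as before under this scheme.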

Checklist

I have added all the necessary unit tests for my change.
I have verified that my change does not break existing code and all unit tests pass.
I have added all appropriate doc-strings/documentation.
My PR is based on the latest changes of the main branch (if unsure, rebase the code).
I have signed the Contributor License Agreement.
I have followed Contribution Guidelines.

wang2yn84 · 2025-12-23T02:18:20Z

Thank you for your PR! Can you add the corresponding test cases to tunix/cli/config_test.py? Thank you!

wang2yn84 · 2025-12-23T02:19:50Z

Also, can you provide the command to generate the "current" model config? Do you overwrite the model of the base config?


Hanjun-Dai · 2025-12-25T07:34:58Z

> Thank you for your PR! Can you add the corresponding test cases to tunix/cli/config_test.py? Thank you!

working on it!

Hanjun-Dai · 2025-12-25T07:37:14Z

> Also, can you provide the command to generate the "current" model config? Do you overwrite the model of the base config?

The existing scripts should still work. If there is no direct modification to model_config (e.g., through the CLI), then model_config takes its value from reference_model_config, which is what is currently used when setting up RL.

Hanjun-Dai · 2026-01-10T03:11:26Z

Closing this one, as #960 provides a better solution.

Hanjun-Dai requested review from abheesht17, hgao327, jiangyangmu, lc5211, sizhit2, tianshub and wang2yn84 as code owners December 20, 2025 23:45

refactor rl config: 1) remove unnecessary params; 2) overwrite model_…

4eaa49e

…config for consistency

Hanjun-Dai force-pushed the clean_rl_config_cli branch from 4e2e19b to 4eaa49e Compare December 25, 2025 07:18

wang2yn84 mentioned this pull request Jan 10, 2026

Fix the cli to have the proper model config over written. #960

Merged

6 tasks

Hanjun-Dai closed this Jan 10, 2026
